ShuffleWatcher: Shuffle-aware Scheduling in Multi-tenant MapReduce Clusters
نویسندگان
چکیده
MapReduce clusters are usually multi-tenant (i.e., shared among multiple users and jobs) for improving cost and utilization. The performance of jobs in a multitenant MapReduce cluster is greatly impacted by the allMap-to-all-Reduce communication, or Shuffle, which saturates the cluster’s hard-to-scale network bisection bandwidth. Previous schedulers optimize Map input locality but do not consider the Shuffle, which is often the dominant source of traffic in MapReduce clusters. We propose ShuffleWatcher, a new multi-tenant MapReduce scheduler that shapes and reduces Shuffle traffic to improve cluster performance (throughput and job turn-around times), while operating within specified fairness constraints. ShuffleWatcher employs three key techniques. First, it curbs intra-job Map-Shuffle concurrency to shape Shuffle traffic by delaying or elongating a job’s Shuffle based on the network load. Second, it exploits the reduced intra-job concurrency and the flexibility engendered by the replication of Map input data for fault tolerance to preferentially assign a job’s Map tasks to localize the Map output to as few nodes as possible. Third, it exploits localized Map output and delayed Shuffle to reduce the Shuffle traffic by preferentially assigning a job’s Reduce tasks to the nodes containing its Map output. ShuffleWatcher leverages opportunities that are unique to multi-tenancy, such overlapping Map with Shuffle across jobs rather than within a job, and trading-off intra-job concurrency for reduced Shuffle traffic. On a 100-node Amazon EC2 cluster running Hadoop, ShuffleWatcher improves cluster throughput by 39-46% and job turn-around times by 27-32% over three state-of-the-art schedulers.
منابع مشابه
Dynamic Scheduling of MapReduce Shuffle under Bandwidth Constraints
Whether it is for e-science or business, the amount of data produced every year is growing at a high rate. Managing and processing those data raises new challenges. MapReduce is one answer to the need for scalable tools able to handle the amount of data. It imposes a general structure of computation and let the implementation perform its optimizations. During the computation, there is a phase c...
متن کاملResource-Aware Adaptive Scheduling for MapReduce Clusters
We present a resource-aware scheduling technique for MapReduce multi-job workloads that aims at improving resource utilization across machines while observing completion time goals. Existing MapReduce schedulers define a static number of slots to represent the capacity of a cluster, creating a fixed number of execution slots per machine. This abstraction works for homogeneous workloads, but fai...
متن کاملAn Adaptive Framework for Managing Heterogeneous Many-Core Clusters
The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as IBM Cell an...
متن کاملResearch on Multi-Tenant Distributed Indexing for SaaS Application
Multi-tenant is the key feature for SaaS application, however, the traditional indexing mechanism has failed in multi-tenant shared scheme database. This paper proposed a multi-tenant distributed indexing mechanism. We create a global index first and then create the local index by MapReduce framework based on Hadoop. We also proposed the process of index update and index merging. Experimental r...
متن کاملScheduling MapReduce Jobs and Data Shuffle on Unrelated Processors
We propose constant approximation algorithms for generalizations of the Flexible Flow Shop (FFS) problem which form a realistic model for non-preemptive scheduling in MapReduce systems. Our results concern the minimization of the total weighted completion time of a set of MapReduce jobs on unrelated processors and improve substantially on the model proposed by Moseley et al. (SPAA 2011) in two ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014